@quantumsteve
Contributor

I noticed that the batched eigensolver function CUSOLVER.heevjBatched! accepts a 3-dimensional StridedCuArray while the similar function CUSOLVER.XsyevBatched! accepts a 2-dimensional StridedCuMatrix. I think having the same interface for both would be desirable.
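To make the interface difference concrete, here is a minimal sketch of the two batch layouts using plain CPU arrays in place of `CuArray`s. It assumes that the 2-dimensional form packs the batch side by side as an `n × (n * batch)` matrix; the description above only states that one function takes a 3-dimensional array and the other a matrix, so treat that packing as an assumption. The names `n`, `batch`, `A3`, and `A2` are illustrative only.

```julia
# Plain CPU arrays stand in for CuArrays here; only the shapes matter.
n, batch = 2, 3

# 3-dimensional layout: A3[:, :, k] is the k-th n×n matrix of the batch.
A3 = zeros(Float64, n, n, batch)
for k in 1:batch
    A3[:, :, k] = [k 1; 1 k]   # a small symmetric matrix per batch entry
end

# 2-dimensional layout (assumed packing): the same matrices side by side,
# giving an n × (n * batch) matrix.
A2 = reshape(A3, n, n * batch)

# Columns (k-1)*n+1 : k*n of the 2-dimensional view hold the k-th matrix.
for k in 1:batch
    @assert A2[:, (k - 1) * n + 1 : k * n] == A3[:, :, k]
end

# reshape shares memory with its argument, so switching layouts copies nothing.
A2[1, 1] = 42.0
@assert A3[1, 1, 1] == 42.0
```

Because Julia arrays are column-major, both layouts describe the same contiguous memory, which is why accepting the 3-dimensional form in both functions is mostly a matter of how the dimensions are read off the argument.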

@github-actions
Contributor

github-actions bot commented Oct 28, 2025

Your PR no longer requires formatting changes. Thank you for your contribution!

Signed-off-by: Steven Hahn <hahnse@ornl.gov>
Signed-off-by: Steven Hahn <hahnse@ornl.gov>
@kshyatt force-pushed the syevbatched_cuarray branch from 258eefe to e3dbe9b on November 13, 2025 16:14
Contributor

@github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: e3dbe9b | Previous: 2e983fe | Ratio |
| --- | --- | --- | --- |
| latency/precompile | 56489805251.5 ns | 56427085830.5 ns | 1.00 |
| latency/ttfp | 8293952143.5 ns | 8362501410 ns | 0.99 |
| latency/import | 4494056576 ns | 4521778039 ns | 0.99 |
| integration/volumerhs | 9623518.5 ns | 9624952.5 ns | 1.00 |
| integration/byval/slices=1 | 147182 ns | 146870 ns | 1.00 |
| integration/byval/slices=3 | 425892.5 ns | 425790 ns | 1.00 |
| integration/byval/reference | 145072.5 ns | 144866 ns | 1.00 |
| integration/byval/slices=2 | 286243 ns | 286021 ns | 1.00 |
| integration/cudadevrt | 103470 ns | 103323 ns | 1.00 |
| kernel/indexing | 14044 ns | 14090 ns | 1.00 |
| kernel/indexing_checked | 14727 ns | 14977.5 ns | 0.98 |
| kernel/occupancy | 669.2784810126582 ns | 670.5886075949367 ns | 1.00 |
| kernel/launch | 2196.8888888888887 ns | 2115.8 ns | 1.04 |
| kernel/rand | 14816 ns | 16842 ns | 0.88 |
| array/reverse/1d | 19812.5 ns | 19633 ns | 1.01 |
| array/reverse/2dL_inplace | 66690 ns | 66698 ns | 1.00 |
| array/reverse/1dL | 69959.5 ns | 69881 ns | 1.00 |
| array/reverse/2d | 21801 ns | 21367 ns | 1.02 |
| array/reverse/1d_inplace | 9785 ns | 9601 ns | 1.02 |
| array/reverse/2d_inplace | 13177 ns | 13220 ns | 1.00 |
| array/reverse/2dL | 73804 ns | 73483 ns | 1.00 |
| array/reverse/1dL_inplace | 66861 ns | 66751 ns | 1.00 |
| array/copy | 20434 ns | 20712 ns | 0.99 |
| array/iteration/findall/int | 156396 ns | 156846 ns | 1.00 |
| array/iteration/findall/bool | 139124 ns | 139935.5 ns | 0.99 |
| array/iteration/findfirst/int | 161010.5 ns | 160606 ns | 1.00 |
| array/iteration/findfirst/bool | 162133 ns | 161405 ns | 1.00 |
| array/iteration/scalar | 72390.5 ns | 72218 ns | 1.00 |
| array/iteration/logical | 215485 ns | 215761.5 ns | 1.00 |
| array/iteration/findmin/1d | 49862 ns | 49669 ns | 1.00 |
| array/iteration/findmin/2d | 96114.5 ns | 96275.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 42907 ns | 43492 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 44354 ns | 44664.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 61379 ns | 61641 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88845 ns | 88640 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87880 ns | 87635.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 36503 ns | 36681 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 47658 ns | 48806 ns | 0.98 |
| array/reductions/reduce/Float32/dims=2 | 59568 ns | 59459 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52285 ns | 52065 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72078 ns | 71664 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43149 ns | 43256 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 44780 ns | 44863 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 61149 ns | 61500 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 88670 ns | 88638 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88056.5 ns | 87897.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36578 ns | 36277.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 41404.5 ns | 41259 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 59539.5 ns | 59440 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52439 ns | 52331.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 71900 ns | 71656.5 ns | 1.00 |
| array/broadcast | 19823 ns | 19817 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11275 ns | 11436 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 213575 ns | 215179 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 282375 ns | 282618 ns | 1.00 |
| array/accumulate/Int64/1d | 124207 ns | 124273 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83056 ns | 83182 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157765.5 ns | 157485 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709258.5 ns | 1709450 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966565 ns | 966304 ns | 1.00 |
| array/accumulate/Float32/1d | 108669 ns | 108932 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80172 ns | 80065 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 146909 ns | 146929 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618657 ns | 1618534.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 697757 ns | 697506 ns | 1.00 |
| array/construct | 1284.5 ns | 1270.6 ns | 1.01 |
| array/random/randn/Float32 | 44191 ns | 47947 ns | 0.92 |
| array/random/randn!/Float32 | 24824 ns | 24918 ns | 1.00 |
| array/random/rand!/Int64 | 27206 ns | 27167 ns | 1.00 |
| array/random/rand!/Float32 | 8765 ns | 8884.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 29627 ns | 37695.5 ns | 0.79 |
| array/random/rand/Float32 | 12851 ns | 12943 ns | 0.99 |
| array/permutedims/4d | 55677 ns | 59797.5 ns | 0.93 |
| array/permutedims/2d | 53547 ns | 53660 ns | 1.00 |
| array/permutedims/3d | 54554 ns | 54666 ns | 1.00 |
| array/sorting/1d | 2757157 ns | 2757791.5 ns | 1.00 |
| array/sorting/by | 3343841 ns | 3344326 ns | 1.00 |
| array/sorting/2d | 1080788 ns | 1080588 ns | 1.00 |
| cuda/synchronization/stream/auto | 1033.0833333333333 ns | 1040 ns | 0.99 |
| cuda/synchronization/stream/nonblocking | 7428.8 ns | 6879.299999999999 ns | 1.08 |
| cuda/synchronization/stream/blocking | 826.0961538461538 ns | 805.0612244897959 ns | 1.03 |
| cuda/synchronization/context/auto | 1180.9 ns | 1175.2 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 7994.2 ns | 7439.7 ns | 1.07 |
| cuda/synchronization/context/blocking | 904.6071428571429 ns | 896.560975609756 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@kshyatt
Member

kshyatt commented Nov 14, 2025

The test failure is due to #2971 and is unrelated to this PR.

Member

@kshyatt left a comment

LGTM, although updating the error messages would be nice.

Signed-off-by: Steven Hahn <hahnse@ornl.gov>
@maleadt merged commit 272c6ef into JuliaGPU:master on November 14, 2025
2 of 3 checks passed
@quantumsteve deleted the syevbatched_cuarray branch on November 14, 2025 18:30